I. RESEARCH OBJECTIVES
Considering the increasing demands on scientists and engineers to model,
solve and analyze larger and more complex problems, the need for multiprocessing
and/or vectorization of the numerical schemes being employed is substantial
if significant time-to-solution reductions are to be achieved. It is not
sufficient merely to implement old algorithms in a parallel processing
environment. New fast algorithms for high speed computation on the modern
generation of supercomputers are essential. In order to meet these challenges,
the research objectives of this AFOSR project are to develop new techniques in
numerical linear algebra and its applications for implementation on these new
architectures. Significantly, applications of our work to practical problems
of structural analysis and design and to least squares adjustments, estimation
and digital filtering are also being investigated.

Our current objectives in structural analysis are to develop efficient
and stable high speed algorithms for the design and analysis of large complex
systems. Of particular interest is the development of stable alternatives to
the often ill-conditioned stiffness matrix approach to solving problems in
elastic analysis and structural dynamics. For example, the principal investigator
and Michael Berry at the University of Illinois Center for Supercomputer
Research and Development are conducting a comparative study of the performance
of seven alternatives to the stiffness approach on the Alliant FX/8
and Cray X-MP systems. These methods involve various orthogonal factorization
approaches as well as preconditioned conjugate gradient methods which completely
avoid formation of the stiffness equations. The results of this study will
be presented this Fall as an invited paper at the World Congress on
Computational Mechanics.

Our work involving least squares problems has several objectives. We
wish to implement and test a recent parallel block scheme by Golub, Sameh
and the principal investigator on the Alliant FX/8 multiprocessor, with
large scale geodetic adjustment computations in mind. We are also
implementing and testing new conjugate gradient type algorithms by Barlow,
Nichols and the principal investigator on the Alliant. A new quadratically
convergent parallel algorithm based upon Newton's method, by Wright and the
principal investigator, has just been implemented on the Alliant. Our final
least squares objective during this period has been to complete the error
analysis and testing of a recursive least squares hyperbolic rotation algorithm
for signal processing. This is joint work with a Ph.D. student, C. Pan, and
with T. Alexander of the NCSU Department of Electrical and Computer Engineering.
Our scheme is amenable to implementation on a variety of vector and parallel
processing systems, such as the Alliant.

Some major results obtained during the past year of this project are
outlined in the next section.

II. SUMMARY OF MAJOR RESULTS


Our most important research accomplishments during the past year are
briefly described below. These results have been obtained on four general
problems in numerical linear algebra and its applications. Preprints detailing
this work have been provided to the AFOSR.

1. Parallel Multisplitting Iterative Methods (Joint with M. Neumann)

Despite the considerable recent activity in parallel processing, relatively
few effective new algorithms designed exclusively for multiprocessors
have been put forth. One such new algorithm is the multisplitting iterative
algorithm suggested by O'Leary and White. Although O'Leary and White,
and later White, have given some sufficient conditions for convergence,
a general convergence theory has not been developed even for the classical
situations where the coefficient matrix A is an M-matrix or is symmetric
positive definite.
Our purpose in this paper is to study the M-matrix case in detail.

The multisplitting process for $A \in \mathbb{R}^{n \times n}$ is recast as an
ordinary iterative process for a certain block matrix
$\hat{A} \in \mathbb{R}^{kn \times kn}$, where $k$ is the number of processors,
and standard convergence results are used to develop a convergence theory for
multisplitting iterative methods where A is an M-matrix.

Comparison results between multisplitting methods are established
in terms of monotonic norms and, for the case where A is irreducible,
in terms of the asymptotic convergence rate. A key observation here
is that in certain cases the rate of global convergence of these parallel
iterative methods is inherent in the splitting of A and is independent
of the manner in which the work is distributed among the processors.
Thus, in such cases, one can distribute the work for load balancing purposes
without affecting the convergence rate.
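To make the multisplitting iteration concrete, here is a small sketch. In the O'Leary-White formulation, processor l holds a splitting A = M_l - N_l and a nonnegative diagonal weighting matrix E_l with E_1 + ... + E_k = I, and the global update is x <- sum over l of E_l M_l^{-1}(N_l x + b). Purely as an illustrative assumption, the sketch uses k = 2 processors with the forward and backward Gauss-Seidel splittings and 0/1 weights that give each processor half of the unknowns. The function names are hypothetical.

```python
def gauss_seidel_sweep(A, b, x, forward=True):
    """Return M^{-1}(N x + b) for the splitting A = M - N, where M is the lower
    (forward=True) or upper (forward=False) triangle of A, diagonal included."""
    n = len(b)
    y = list(x)  # entries not yet visited still hold the old iterate
    order = range(n) if forward else range(n - 1, -1, -1)
    for i in order:
        s = sum(A[i][j] * y[j] for j in range(n) if j != i)
        y[i] = (b[i] - s) / A[i][i]
    return y


def multisplitting_solve(A, b, x0, tol=1e-10, max_iter=500):
    """Two-processor multisplitting: x <- E1 M1^{-1}(N1 x + b) + E2 M2^{-1}(N2 x + b)."""
    n = len(b)
    half = n // 2
    # Hypothetical 0/1 diagonal weights (E1 + E2 = I): processor 1 owns the
    # first half of the unknowns, processor 2 owns the second half.
    E = [[1.0 if i < half else 0.0 for i in range(n)],
         [0.0 if i < half else 1.0 for i in range(n)]]
    x = list(x0)
    for it in range(1, max_iter + 1):
        # The two local solves are independent and could run concurrently.
        local = [gauss_seidel_sweep(A, b, x, forward=True),
                 gauss_seidel_sweep(A, b, x, forward=False)]
        x_new = [E[0][i] * local[0][i] + E[1][i] * local[1][i] for i in range(n)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, it
        x = x_new
    return x, max_iter


# Usage: a tridiagonal M-matrix whose exact solution is the vector of ones.
n = 8
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]
x, iters = multisplitting_solve(A, b, [0.0] * n)
```

With these particular weights, processor 1 only needs the forward sweep over its own rows and processor 2 only needs the backward sweep over the remaining rows. Each processor therefore does about half of a Gauss-Seidel sweep per iteration. The sketch computes full sweeps for simplicity. Both splittings are regular splittings of an M-matrix, which is the setting in which the convergence theory above applies.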

2. Conjugate Gradient Method for Equality Constrained Least Squares
with Applications to Structural Analysis (Joint with J. Barlow and
N. Nichols)

A preconditioned conjugate gradient algorithm has been developed
for solving large scale least squares problems with equality constraints.
The method has been implemented, tested and compared with other methods
on the Alliant FX/8 and Cray X-MP systems for solving large scale problems
in structural optimization and design. This method can be applied to
both full rank and rank deficient applications in structural analysis.
Comparisons with various other approaches, including a recent weighting
method by Van Loan, are made on a testbed of structural analysis data.
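The following sketch illustrates the problem class. It solves min ||b - Ax||_2 subject to Cx = d by running ordinary conjugate gradients on the normal equations restricted to the null space of C. Several choices here are assumptions made for illustration: the projected-CG formulation, the absence of any preconditioner, and all function names. It is not the preconditioned Barlow-Nichols-Plemmons algorithm. It assumes that C has full row rank and that A has full column rank on the null space of C.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def matTvec(M, v):
    """Compute M^T v for M stored as a list of rows."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def solve(M, v):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(v)
    T = [list(M[i]) + [v[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x


def constrained_ls_cg(A, b, C, d, tol=1e-12):
    CCt = [[dot(ci, cj) for cj in C] for ci in C]

    def project(v):
        # Orthogonal projection onto null(C): v - C^T (C C^T)^{-1} C v.
        w = matTvec(C, solve(CCt, matvec(C, v)))
        return [a - c for a, c in zip(v, w)]

    # Minimum-norm particular solution of C x = d.
    x = matTvec(C, solve(CCt, d))
    # CG on P A^T A P z = P A^T (b - A x); every update direction lies in
    # null(C), so C x = d is preserved throughout.
    r = project(matTvec(A, [bi - ai for bi, ai in zip(b, matvec(A, x))]))
    p = list(r)
    rr = dot(r, r)
    for _ in range(2 * len(x)):
        if rr ** 0.5 < tol:
            break
        q = project(matTvec(A, matvec(A, p)))
        alpha = rr / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Usage: minimize ||b - A x|| subject to x1 - x2 = 0; the minimizer is x1 = x2 = 5/9.
A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b = [1.0, 1.0, 1.0]
C = [[1.0, -1.0]]
d = [0.0]
x = constrained_ls_cg(A, b, C, d)
```

The projection keeps every CG iterate feasible, so the constraint is never traded off against the residual. Weighting methods such as the one by Van Loan mentioned above take a different route.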

3. Parallel Block Schemes for Large Scale Least Squares Computations
(Joint with G. Golub and A. Sameh)

Large scale least squares computations arise in a variety of scientific
and engineering problems, including geodetic adjustments and surveys,
medical image analysis, molecular structures, partial differential equations
and substructuring methods in structural engineering. In each of these
problems, the matrices that arise often possess a block structure which
reflects the local connection nature of the underlying physical problem.
For example, super-large nonlinear least squares computations currently
arise in geodesy. Here the coordinates of positions are calculated by
iteratively solving overdetermined systems of nonlinear equations by
the Gauss-Newton method. The U.S. National Geodetic Survey will complete
this year (1986) the readjustment of the North American Datum, a problem
which involves over 540 thousand unknowns and over 6.5 million observations
(equations). The observation matrix for the least squares computations
has a block angular form with 161 diagonal blocks, each containing 3
to 4 thousand unknowns. In this paper parallel schemes are suggested
for the orthogonal factorization of matrices in block angular form and
for the associated back substitution phase of the least squares computations.
In addition, a parallel scheme for the calculation of certain elements
of the covariance matrix for such problems is described. It is shown
that these algorithms are ideally suited for multiprocessors with three
levels of parallelism, such as the Cedar system at the University of Illinois.
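The sketch below shows the three-stage structure of such a scheme for a block angular matrix. Block i has a diagonal block A_i and a coupling block B_i, so its residual is b_i - A_i x_i - B_i y. The details are assumptions, not the scheme of the paper: each A_i is factored by modified Gram-Schmidt QR, the coupling unknowns y come from the stacked projected blocks, and each x_i is then recovered by back substitution. The per-block loops are the parts that could run concurrently. Function names are hypothetical, and each A_i and the overall matrix are assumed to have full column rank.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def columns(M):
    return [list(c) for c in zip(*M)]


def mgs_qr(cols):
    """Thin QR by modified Gram-Schmidt of a full-column-rank matrix given by columns."""
    q, n = [list(c) for c in cols], len(cols)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = dot(q[j], q[j]) ** 0.5
        q[j] = [v / R[j][j] for v in q[j]]
        for k in range(j + 1, n):
            R[j][k] = dot(q[j], q[k])
            q[k] = [a - R[j][k] * c for a, c in zip(q[k], q[j])]
    return q, R


def back_solve(R, rhs):
    n = len(rhs)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (rhs[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x


def perp(q, v):
    """Component of v orthogonal to the span of the orthonormal vectors q."""
    for qj in q:
        c = dot(qj, v)
        v = [a - c * b for a, b in zip(v, qj)]
    return v


def block_angular_ls(blocks):
    """blocks: list of (A_i, B_i, b_i), with A_i and B_i given as lists of rows."""
    # Stage 1 (independent per block): factor A_i and project out range(A_i).
    factors, stacked_cols, stacked_rhs = [], None, []
    for A_i, B_i, b_i in blocks:
        q, R = mgs_qr(columns(A_i))
        factors.append((q, R))
        proj_cols = [perp(q, c) for c in columns(B_i)]
        stacked_cols = proj_cols if stacked_cols is None else [
            s + c for s, c in zip(stacked_cols, proj_cols)]
        stacked_rhs += perp(q, list(b_i))
    # Stage 2: small least squares problem for the coupling unknowns y.
    qy, Ry = mgs_qr(stacked_cols)
    y = back_solve(Ry, [dot(qj, stacked_rhs) for qj in qy])
    # Stage 3 (independent per block): back substitution for each x_i.
    xs = []
    for (q, R), (A_i, B_i, b_i) in zip(factors, blocks):
        r_i = [bv - dot(row, y) for row, bv in zip(B_i, b_i)]
        xs.append(back_solve(R, [dot(qj, r_i) for qj in q]))
    return xs, y


# Usage: consistent data generated from x_1 = [1, 2], x_2 = [3], y = [-1].
blocks = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1.0], [0.0], [0.0]], [0.0, 2.0, 3.0]),
    ([[1.0], [1.0]], [[0.0], [1.0]], [3.0, 2.0]),
]
xs, y = block_angular_ls(blocks)
```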

4. Analysis and Testing of a Fast Recursive Least Squares Filtering Algorithm
in Signal Processing (Joint with T. Alexander and C. Pan)

The application of hyperbolic plane rotations to the least squares
downdating problem arising in windowed recursive least squares signal
processing is studied. A backward error analysis under some simplifying
assumptions is used to show that this method can be expected to perform
well in the presence of rounding errors, provided that the problem is
not too ill-conditioned. It is shown in detail how the method's stability
depends upon the conditioning. The results here contrast with the recent
error analyses of downdating methods by Bojanczyk, Brent, Van Dooren
and de Hoog, who suggest mixed rather than backward stability bounds.
Comparisons are made with the usual method based upon orthogonal rotations,
as implemented in LINPACK. Both methods have an important advantage
over the classical normal equations approach: they can be effectively
implemented on special purpose signal processing devices with shorter
wordlengths. However, the hyperbolic rotation method requires n fewer
multiplications and additions for each downdating step than the orthogonal
rotation method, where n is the number of least squares filter coefficients.
In addition, it is more amenable to implementation on a variety of vector
and parallel machines. In many signal processing applications n is not
large, and if n processors are available, then the downdating process
can be accomplished in 2n time steps by the hyperbolic rotation method.
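The following is a minimal sketch of a hyperbolic-rotation downdate of an upper triangular factor R. Here R^T R is the current windowed information matrix and z is the data row leaving the window. For each k, row k of R and the vector z are combined by the hyperbolic rotation (1/c)[[1, -rho], [-rho, 1]], with rho = z_k / r_kk and c = sqrt(1 - rho^2). This annihilates z_k while preserving R^T R - z z^T. This is a textbook form of the technique. The scaling and safeguards in the algorithm analyzed in the paper may differ, and the function names are hypothetical.

```python
import math


def hyperbolic_downdate(R, z):
    """Return upper triangular Rd with Rd^T Rd = R^T R - z z^T.

    R must be upper triangular with a positive diagonal; the downdate fails if
    R^T R - z z^T is not positive definite (|rho| >= 1 at some step)."""
    n = len(R)
    R = [list(row) for row in R]  # work on copies
    z = list(z)
    for k in range(n):
        rho = z[k] / R[k][k]
        if abs(rho) >= 1.0:
            raise ValueError("downdated matrix is not positive definite")
        c = math.sqrt(1.0 - rho * rho)
        # Row update over columns k..n-1: each column is independent (vectorizable).
        for j in range(k, n):
            r_kj, z_j = R[k][j], z[j]
            R[k][j] = (r_kj - rho * z_j) / c
            z[j] = (z_j - rho * r_kj) / c
    return R


def gram(R):
    """Compute R^T R."""
    n = len(R)
    return [[sum(R[k][i] * R[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Usage: remove the row z from the information matrix R^T R.
R = [[3.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]]
z = [1.0, 0.5, 0.5]
Rd = hyperbolic_downdate(R, z)
G = gram(Rd)
```

Each step k touches only row k of R and the vector z. In the 2-by-2 rotation, the new diagonal entry is c * r_kk > 0, so the factor stays upper triangular with a positive diagonal.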

III. RESEARCH IN PROGRESS

Our research projects in support of this grant which are currently underway
are briefly described below. Preprints of research papers providing complete
descriptions of the results of these projects will soon be available.

1. A Robust Parallel Algorithm for Minimizing a Weighted Sum of Euclidean
Norms (Joint with S. Wright)

A robust, quadratically convergent parallel algorithm is being
developed for solving the nonlinear problem

$$\min_{x} \sum_{i=1}^{s} \| b_i - A_i x \|_2 ,$$

where the $A_i$ are $m_i \times n$ matrices with full column rank $n$,
$1 \le i \le s$. Applications arise in facility location problems, in geodetic
adjustments and in surface fitting problems. The algorithm has been implemented
and testing is underway on an Alliant FX/8 vector multiprocessor system.
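The sketch below makes the objective concrete by minimizing sum_i ||b_i - A_i x||_2 with iteratively reweighted least squares. Each step solves (sum_i w_i A_i^T A_i) x = sum_i w_i A_i^T b_i, with weights w_i = 1/||b_i - A_i x||_2 taken from the previous iterate. This is a swapped-in technique: for facility location with A_i = I, it reduces to the classical Weiszfeld iteration. It converges only linearly and is not the robust, quadratically convergent Newton-based algorithm described above. The function names, the fixed iteration count and the small floor `eps` on the residual norms are assumptions.

```python
import math


def solve(M, v):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(v)
    T = [list(M[i]) + [v[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x


def sum_of_norms_irls(As, bs, x0, iters=200, eps=1e-12):
    """Approximately minimize sum_i ||b_i - A_i x||_2 by reweighted least squares."""
    n = len(x0)
    x = list(x0)
    for _ in range(iters):
        M = [[0.0] * n for _ in range(n)]
        v = [0.0] * n
        for A, b in zip(As, bs):
            res = [bk - sum(a * xj for a, xj in zip(row, x)) for row, bk in zip(A, b)]
            # Weight 1/||b_i - A_i x||; the floor avoids division by zero.
            w = 1.0 / max(math.sqrt(sum(r * r for r in res)), eps)
            for row, bk in zip(A, b):
                for j in range(n):
                    v[j] += w * row[j] * bk
                    for k in range(n):
                        M[j][k] += w * row[j] * row[k]
        x = solve(M, v)
    return x


# Usage (facility location): the point minimizing total distance to four sites.
sites = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
x = sum_of_norms_irls([I2] * len(sites), sites, x0=[0.3, 0.7])
```

For these four sites at the corners of a square, the minimizer is the center (1, 1).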


2. Preconditioned Conjugate Gradients by Incomplete Hyperbolic Reduction
(Joint with D. Pierce)

A new conjugate gradient algorithm based in part upon SSOR
preconditioning is being investigated. The novel feature of our approach
is the use of stable hyperbolic rotation incomplete factorization
techniques to enhance the convergence properties. The method is
being implemented on an Alliant FX/8 and tested using a testbed of
structural analysis data.
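For reference, the following sketch shows standard preconditioned conjugate gradients with the SSOR preconditioner M = (omega/(2 - omega)) (D/omega + L) (D/omega)^{-1} (D/omega + L)^T, for a symmetric positive definite A = L + D + L^T. The incomplete hyperbolic reduction that is the novel part of this project is not reproduced here. The relaxation parameter omega = 1.2, the tolerance and the function names are illustrative assumptions.

```python
def ssor_apply(A, r, omega):
    """Return z = M^{-1} r for the SSOR preconditioner of a symmetric matrix A."""
    n = len(r)
    u = [0.0] * n
    for i in range(n):  # forward solve (D/omega + L) u = r
        s = sum(A[i][j] * u[j] for j in range(i))
        u[i] = (r[i] - s) * omega / A[i][i]
    t = [A[i][i] / omega * u[i] for i in range(n)]  # multiply by D/omega
    z = [0.0] * n
    for i in reversed(range(n)):  # backward solve (D/omega + L^T) z = t
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (t[i] - s) * omega / A[i][i]
    return [(2.0 - omega) / omega * zi for zi in z]


def pcg(A, b, omega=1.2, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)
    z = ssor_apply(A, r, omega)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for it in range(1, max_iter + 1):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if max(abs(ri) for ri in r) < tol:
            return x, it
        z = ssor_apply(A, r, omega)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: a symmetric positive definite tridiagonal system with solution all ones.
n = 10
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [sum(row) for row in A]
x, iters = pcg(A, b)
```

Any other symmetric positive definite incomplete factorization could replace `ssor_apply` without changing the CG loop. That is the point at which a hyperbolic-rotation incomplete factor would plug in.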

3. Geodetic Least Squares Adjustment Techniques on the Cedar System
(Joint with W. Harrod and A. Sameh)

Our purpose is to implement and test a parallel block orthogonal
factorization scheme on the Cedar multiprocessor system being developed
at the University of Illinois Center for Supercomputer Research and
Development. The first phase of this project includes implementation
on the Alliant FX/8 system, which will form the "clusters" for the
Cedar machine. Tests will be made using geodetic data supplied by
the National Geodetic Survey and by the Defense Mapping Agency.

4. Parallel Algorithms and Experiments for Structural Analysis (Joint
with M. Berry)

The implementation of direct and iterative methods for the solution
of elastic analysis problems on the Alliant FX/8 and the CRAY X-MP/24
is underway. The direct methods include the classical displacement
method, the natural factor method by Argyris, and weighted least
squares methods by Van Loan. The iterative methods include a
preconditioned conjugate gradient method for constrained least squares
problems by Barlow/Nichols/Plemmons and a preconditioned conjugate
gradient method for weighted least squares equations by Freund.
Performance tests on the Alliant FX/8 are being conducted to determine
which method or methods are best suited for parallelization and speedup.
Comparing the accuracy of the force vectors and the timings of the
iterative schemes with those of the direct schemes on two-dimensional frame
problems, we can expect lower execution times for the iterative methods
if we accept force vectors that yield high precision residuals with
lower precision quadratic forms.

While the performance of the two conjugate gradient schemes
is approximately the same, the Barlow/Nichols/Plemmons scheme has
the advantage that no weighting of the equilibrium matrix is required.
Some of the parallel algorithms being used experimentally in all these
methods include block Cholesky factorization, block Householder QR
factorization and pipelined Givens reduction. Comparisons in speed with
the appropriate routines from LINPACK are also being made. The vectorization
potential of the methods is being determined by the implementations on the
CRAY X-MP/24. All results thus far are preliminary, and further code revisions
are necessary for both the Alliant FX/8 and the CRAY X-MP/24. The results
of this short-term project will be described in a paper which will
be presented at the First World Congress on Computational Mechanics
in Austin, Texas in September.
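Of the kernels listed above, Givens reduction is the simplest to sketch. The version below folds observation rows into an upper triangular R one at a time. This row-oriented form is the one that lends itself to pipelining across processors. The sketch shows only the sequential kernel: the pipelining, the block variants and the function names are assumptions not taken from the project code. The coefficient matrix is assumed to have full column rank.

```python
import math


def givens_add_row(R, rhs, row, beta):
    """Rotate one observation (row, beta) into the triangular system R x = rhs.

    R and rhs are updated in place; the returned leftover value contributes
    its square to the residual sum of squares."""
    n = len(R)
    row = list(row)
    for k in range(n):
        if row[k] == 0.0:
            continue
        r = math.hypot(R[k][k], row[k])
        c, s = R[k][k] / r, row[k] / r
        for j in range(k, n):  # the rotation zeroes row[k]
            R[k][j], row[j] = c * R[k][j] + s * row[j], -s * R[k][j] + c * row[j]
        rhs[k], beta = c * rhs[k] + s * beta, -s * rhs[k] + c * beta
    return beta


def givens_least_squares(A, b):
    """Solve min ||b - A x||_2 by row-wise Givens QR and back substitution."""
    n = len(A[0])
    R = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    ssr = 0.0
    for row, beta in zip(A, b):
        ssr += givens_add_row(R, rhs, row, beta) ** 2
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (rhs[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x, ssr


# Usage: fit a straight line c0 + c1*t to four points.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0, 4.0]
x, ssr = givens_least_squares(A, b)  # x is close to [0.9, 0.9], ssr close to 0.7
```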

IV. TECHNICAL PUBLICATIONS

1. "A parallel block iterative scheme applied to computations in structural
analysis", SIAM J. Alg. and Disc. Meth., 7 (1986), 337-347.

2. "Convergent iterations for computing stationary distributions of Markov
chains", SIAM J. Alg. and Disc. Meth., 7 (1986), 390-398 (with G. Barker).

3. "An algorithm to compute a sparse basis of the null space", Numerische
Math., 47 (1985), 483-504 (with M. Berry, M. Heath, I. Kaneko, M. Lawo
and R. Ward).

4. "Updating LU factorizations for computing stationary distributions",
SIAM J. Alg. and Disc. Meth., 7 (1986), 30-42 (with R. Funderlic).

5. "Convergence of parallel multisplitting iterative methods for M-matrices",
to appear in Lin. Alg. and Applic. (with M. Neumann).

6. "A conjugate gradient method for the solution of equality constrained
least squares problems", to appear in the Proc. of the 1986 SPIE Conf.,
San Diego, CA (with J. Barlow and N. Nichols).

7. "Parallel block schemes for large scale least squares computations",
to appear in the Proc. of the Workshop on Scientific Applic. and Alg.
Design for High Speed Computing, Urbana, IL (with G. Golub and A. Sameh).

8. "Analysis of a recursive least squares hyperbolic rotation algorithm
for signal processing", submitted to the Lin. Alg. and Applic. Special
Issue on Applic. to Elec. Eng. (with S. Alexander and C. Pan).

V. PERSONNEL ASSOCIATED WITH THE RESEARCH EFFORT

R. J. Plemmons, Principal Investigator (1 mo. summer, 1 mo. academic year)

R. B. Mattingly, GRA (1/2 time)

D. J. Pierce, GRA (1/2 time), Ph.D. expected Fall 1986 or Spring 1987.

VI. CONFERENCE AND COLLOQUIUM ACTIVITIES

1. Colloquium Lecture - "Parallel algorithms in structural analysis",
University of Virginia, October 1985.

2. Contributed Lecture - "Parallel algorithms for least squares problems in dual
angular form", SIAM Conference on Parallel Processing, Norfolk, Virginia,
November 1985 (presented by D. Pierce).

3. Contributed Lecture - "Some parallel algorithms for matrix structural analysis",
SIAM Conference on Parallel Processing, Norfolk, Virginia, November 1985.

4. Invited Lecture - "Multisplitting parallel iterative methods", NSF Conference
on Matrix Theory, Auburn, AL, March 1986.

5. Invited Panelist - "Geodetic computations", Workshop on Scientific Applications
and Algorithm Design for High Speed Computation, Urbana, IL, April 1986.

6. Colloquium Lecture - "Parallel iterative methods", Wake Forest University,
April 1986.

7. Invited Lecture - "Analysis of a fast recursive least squares filtering
algorithm in signal processing", Workshop on Communications and Signal
Processing, Raleigh, NC, May 1986.

8. Conference Organizing Committees -

(a) SIAM Conference on Linear Algebra in Signals, Systems and Control,
Boston, MA, August 1986.

(b) Invited Special Session - "Advances in Parallel Processing", World
Congress on Computational Mechanics, Austin, TX, September 1986.

(c) Third SIAM Conference on Applied Linear Algebra, Madison, WI, May 1987.

VII. SUMMARY


To summarize, the activities described in this Interim Annual Report
represent our efforts to develop, analyze and test fast algorithms for
structural analysis and least squares problems. Special features of the
problems are being addressed, and implementations are being made on modern
high performance architectures such as the CRAY 2, CRAY X-MP and Alliant
FX/8 multiprocessors.


